A Taxonomy of Dirty Time-Oriented Data

نویسندگان

  • Theresia Gschwandtner
  • Johannes Gärtner
  • Wolfgang Aigner
  • Silvia Miksch
چکیده

Data quality is a vital topic for business analytics in order to gain accurate insight and make correct decisions in many data-intensive industries. Albeit systematic approaches to categorize, detect, and avoid data quality problems exist, the special characteristics of time-oriented data are hardly considered. However, time is an important data dimension with distinct characteristics which affords special consideration in the context of dirty data. Building upon existing taxonomies of general data quality problems, we address ‘dirty’ time-oriented data, i.e., timeoriented data with potential quality problems. In particular, we investigated empirically derived problems that emerge with different types of time-oriented data (e.g., time points, time intervals) and provide various examples of quality problems of time-oriented data. By providing categorized information related to existing taxonomies, we establish a basis for further research in the field of dirty time-oriented data, and for the formulation of essential quality checks when preprocessing time-oriented data.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Rule Based Taxonomy of Dirty Data

There is a growing awareness that high quality of data is a key to today’s business success and that dirty data existing within data sources is one of the causes of poor data quality. To ensure high quality data, enterprises need to have a process, methodologies and resources to monitor, analyze and maintain the quality of data. Nevertheless, research shows that many enterprises do not pay adeq...

متن کامل

Objects Identification in Object-Oriented Software Development - A Taxonomy and Survey on Techniques

Analysis and design of object oriented is onemodern paradigms for developing a system. In this paradigm, there are several objects and each object plays some specific roles. Identifying objects (and classes) is one of the most important steps in the object-oriented paradigm. This paper makes a literature review over techniques to identify objects and then presents six taxonomies for them. The f...

متن کامل

Declarative Semantics in Object-Oriented Software Development - A Taxonomy and Survey

One of the modern paradigms to develop an application is object oriented analysis and design. In this paradigm, there are several objects and each object plays some specific roles in applications. In an application, we must distinguish between procedural semantics and declarative semantics for their implementation in a specific programming language. For the procedural semantics, we can write a ...

متن کامل

Qualitative Data Cleaning

Data quality is one of the most important problems in data management, since dirty data often leads to inaccurate data analytics results and wrong business decisions. Data cleaning exercise often consist of two phases: error detection and error repairing. Error detection techniques can either be quantitative or qualitative; and error repairing is performed by applying data transformation script...

متن کامل

Data Quality in Very Large, Multiple-Source, Secondary Datasets for Data Mining Applications

The data mining research community is increasingly addressing data quality issues, including problems of dirty data. Hand, Blunt, Kelly and Adams (2000) have identified high-level and low-level quality issues in data mining. Kim, Choi, Hong, Kim and Lee (2003) have compiled a useful, complete taxonomy of dirty data that provides a starting point for research in effective techniques and fast alg...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012